<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>
  File: README
  
    &mdash; Documentation by YARD 0.8.7.3
  
</title>

  <link rel="stylesheet" href="css/style.css" type="text/css" charset="utf-8" />

  <link rel="stylesheet" href="css/common.css" type="text/css" charset="utf-8" />

<script type="text/javascript" charset="utf-8">
  hasFrames = window.top.frames.main ? true : false;
  relpath = '';
  framesUrl = "frames.html#!" + escape(window.location.href);
</script>


  <script type="text/javascript" charset="utf-8" src="js/jquery.js"></script>

  <script type="text/javascript" charset="utf-8" src="js/app.js"></script>


  </head>
  <body>
    <div id="header">
      <div id="menu">
  
    <a href="_index.html">Index</a> &raquo; 
    <span class="title">File: README</span>
  

  <div class="noframes"><span class="title">(</span><a href="." target="_top">no frames</a><span class="title">)</span></div>
</div>

      <div id="search">
  
    <a class="full_list_link" id="class_list_link"
        href="class_list.html">
      Class List
    </a>
  
    <a class="full_list_link" id="method_list_link"
        href="method_list.html">
      Method List
    </a>
  
    <a class="full_list_link" id="file_list_link"
        href="file_list.html">
      File List
    </a>
  
</div>
      <div class="clear"></div>
    </div>

    <iframe id="search_frame"></iframe>

    <div id="content"><div id='filecontents'>
<h1 id="label-QuantileEstimator.rb">QuantileEstimator.rb</h1>

<p>This implements the quantile estimator described in the paper:</p>

<p>Cormode et. al.:
&quot;Effective Computation of Biased Quantiles over Data
Streams&quot;</p>

<p>Given the different strategies to implement the algorithm I used the
easiest one by
using <a
href="https://github.com/odo/quantile_estimator">github.com/odo/quantile_estimator</a>
as reference for the implementation.</p>

<h2 id="label-Installation">Installation</h2>

<p>Add this line to your application&#39;s Gemfile:</p>

<pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_gem'>gem</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>quantile_estimator</span><span class='tstring_end'>&#39;</span></span></code></pre>

<p>And then execute:</p>

<pre class="code ruby"><code class="ruby">$ bundle</code></pre>

<p>Or install it yourself as:</p>

<pre class="code ruby"><code class="ruby">$ gem install quantile_estimator</code></pre>

<h2 id="label-Usage">Usage</h2>

<p>First initialize the estimator. Right now you can either have a
<code>Biased</code> or
<code>Targeted</code> invariants. The targeted
invariant let you select the quantiles you are
particularly interested and
will yield higher compression rates.</p>

<pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_biased'>biased</span> <span class='op'>=</span> <span class='const'>Invariant</span><span class='op'>::</span><span class='const'>Biased</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='float'>0.004</span><span class='rparen'>)</span>
<span class='id identifier rubyid_targeted'>targeted</span> <span class='op'>=</span> <span class='const'>Invariant</span><span class='op'>::</span><span class='const'>Targeted</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='lbracket'>[</span>
                                    <span class='lbracket'>[</span><span class='float'>0.05</span><span class='comma'>,</span> <span class='float'>0.02</span><span class='rbracket'>]</span><span class='comma'>,</span>
                                    <span class='lbracket'>[</span><span class='float'>0.5</span><span class='comma'>,</span>  <span class='float'>0.02</span><span class='rbracket'>]</span><span class='comma'>,</span>
                                    <span class='lbracket'>[</span><span class='float'>0.95</span><span class='comma'>,</span> <span class='float'>0.02</span><span class='rbracket'>]</span>
                                   <span class='rbracket'>]</span><span class='rparen'>)</span>

<span class='id identifier rubyid_estimator'>estimator</span> <span class='op'>=</span> <span class='const'>Estimator</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='id identifier rubyid_targeted'>targeted</span><span class='rparen'>)</span></code></pre>

<p>Insertion of data is as simple as:</p>

<pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_estimator'>estimator</span><span class='period'>.</span><span class='id identifier rubyid_insert'>insert</span><span class='lparen'>(</span><span class='id identifier rubyid_value'>value</span><span class='rparen'>)</span></code></pre>

<p>The insertion of data <em>doesn&#39;t</em> automatically compress it. To
compress the data just
call:</p>

<pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_estimator'>estimator</span><span class='period'>.</span><span class='id identifier rubyid_compress!'>compress!</span></code></pre>

<p>Using these primitives you can build wraps to compress on every nth insert
as shown
in
<a
href="https://github.com/diegoeche/quantile_estimator.rb/blob/master/benchmark.rb">this
file</a></p>

<h2 id="label-Pretty+Graphs">Pretty Graphs</h2>

<p>Using a targeted invariant to keep track of the 0.05, 0.5 and 0.95
quantiles, a
uniform source of random values and compressing every 100
iterations. We get the
following behavior regarding the size of the
internal data structure:</p>

<p><img
src="https://raw.github.com/diegoeche/quantile_estimator.rb/master/doc/compression.png"
/></p>

<p>Running time behavior is not too bad. The following graph shows the cost
of
insertions in the estimator. The homogeneous layer of outlayers probably
corresponds
to the compression cycles, while the bottom line is the cost of
compression-less
insertions.</p>

<p><img
src="https://raw.github.com/diegoeche/quantile_estimator.rb/master/doc/time.png"
/></p>

<p>Different distributions, different invariants setups will have different
behaviors.</p>

<p>Check your real data before using this!</p>

<h2 id="label-Known+issues">Known issues</h2>

<p>The implementation is known not to be thread-safe, and little effort has
been done to
optimize it.</p>

<h2 id="label-Contributing">Contributing</h2>
<ol><li>
<p>Fork it</p>
</li><li>
<p>Create your feature branch (<code>git checkout -b my-new-feature</code>)</p>
</li><li>
<p>Commit your changes (<code>git commit -am &#39;Add some
feature&#39;</code>)</p>
</li><li>
<p>Push to the branch (<code>git push origin my-new-feature</code>)</p>
</li><li>
<p>Create new Pull Request</p>
</li></ol>
</div></div>

    <div id="footer">
  Generated on Fri Nov 15 15:39:43 2013 by
  <a href="http://yardoc.org" title="Yay! A Ruby Documentation Tool" target="_parent">yard</a>
  0.8.7.3 (ruby-2.0.0).
</div>

  </body>
</html>